We study the composition style in deep image matting, a notion that characterizes how a data generation flow exploits limited foregrounds and random backgrounds to form a training dataset. Prior art executes this flow in a completely random manner, either by simply iterating through the foreground pool or by optionally combining two foregrounds before foreground-background composition. In this work, we first show that naive foreground combination can be problematic and therefore derive an alternative formulation for combining foregrounds reasonably. Our second contribution is the observation that matting performance can benefit from a certain occurrence frequency of combined foregrounds and their associated source foregrounds during training. Inspired by this, we introduce a novel composition style that binds the source and combined foregrounds in a definite triplet. In addition, we find that different orders of foreground combination lead to different foreground patterns, which further inspires a quadruplet-based composition style. Results under controlled experiments on four matting baselines show that our composition styles outperform existing ones and yield consistent performance improvements on both composited and real-world datasets. Code is available at: https://github.com/coconuthust/composition_styles
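To make the compositing vocabulary concrete, the sketch below shows three pieces. The first is the standard matting equation I = αF + (1 − α)B. The second is one naive way to combine two foregrounds, the Porter-Duff "over" operator. Treat that operator as an assumption for illustration only, not the alternative formulation derived in this work. The third is a hypothetical triplet-style sampler that keeps each combined foreground together with its two source foregrounds. All function names and the string-based foreground pool are illustrative.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def composite(fg: Pixel, alpha: float, bg: Pixel) -> Pixel:
    """Standard matting equation I = alpha * F + (1 - alpha) * B, per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def combine_over(fg1: Pixel, a1: float, fg2: Pixel, a2: float) -> Tuple[Pixel, float]:
    """Naive combination (assumption): place foreground 1 over foreground 2
    with the Porter-Duff 'over' operator.

    Returns the combined, un-premultiplied foreground color and its alpha.
    The result depends on argument order.
    """
    a = a1 + a2 * (1.0 - a1)
    if a == 0.0:
        return (0.0, 0.0, 0.0), 0.0
    # Premultiplied color of the stack, then divide by the combined alpha
    premult = [a1 * c1 + a2 * (1.0 - a1) * c2 for c1, c2 in zip(fg1, fg2)]
    return tuple(p / a for p in premult), a


def triplet_schedule(pool: List[str]) -> List[List[str]]:
    """Hypothetical triplet sampler: each combined foreground is emitted
    together with its two source foregrounds. An odd leftover item is dropped."""
    batches = []
    for i in range(0, len(pool) - 1, 2):
        a, b = pool[i], pool[i + 1]
        batches.append([a, b, f"{a}+{b}"])
    return batches
```

Swapping the argument order of `combine_over` changes the combined color. This mirrors the abstract's remark that different combination orders yield different foreground patterns.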
Novel artificial intelligence (AI) technology has expedited research in various scientific fields, e.g., cosmology, physics, and bioinformatics, and has inevitably become a significant category of workload on high performance computing (HPC) systems. Existing AI benchmarks tend to customize well-recognized AI applications so as to evaluate the AI performance of HPC systems under a predefined problem size, in terms of datasets and AI models. Because they cannot scale the problem size, such static AI benchmarks may be inadequate for understanding the performance trend of evolving AI applications on HPC systems, in particular scientific AI applications on large-scale systems. In this paper, we propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems by scaling the problem sizes of customized AI applications. To enable scalability, SAIH builds a set of novel mechanisms for augmenting problem sizes. As the data and model constantly scale, we can investigate the trend and range of AI performance on HPC systems and further diagnose system bottlenecks. To verify our methodology, we augment a cosmological AI application to evaluate a real HPC system equipped with GPUs as a case study of SAIH.
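Large transformer models display promising performance on a wide range of natural language processing (NLP) tasks. Although the AI community has expanded the model scale to the trillion-parameter level, the practical deployment of 10 to 100 billion parameter models is still uncertain due to latency, throughput, and memory constraints. In this paper, we propose EnergonAI to solve the challenges of efficiently deploying 10 to 100 billion parameter transformer models on single- or multi-GPU systems. EnergonAI adopts a hierarchy-controller system architecture to coordinate multiple devices and efficiently support different parallel patterns. It delegates the execution of sub-models to multiple workers in a single-controller style and applies tensor parallelism and pipeline parallelism among the workers in a multi-controller style. Upon this novel architecture, we propose three techniques: non-blocking pipeline parallelism, distributed redundant computation elimination, and peer memory pooling. EnergonAI enables users to program complex parallel code in the same way as serial code. Compared with FasterTransformer, we show that EnergonAI achieves superior latency and throughput. In our experiments, EnergonAI achieves a 37% latency reduction with tensor parallelism and a 10% scalability improvement with pipeline parallelism. It also increases the model scale that can be inferred on a single GPU by using a larger heterogeneous memory space, at the cost of a limited performance reduction.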
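Tensor parallelism, one of the two parallel modes mentioned above, splits a single layer's weights across workers. The sketch below is a generic Megatron-style column-parallel linear layer written in pure Python, with the "workers" simulated sequentially in one process. It illustrates the general technique only and is not EnergonAI's API. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain dense matmul: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w))) for j in range(len(w[0]))]
            for i in range(len(x))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Shard the weight matrix column-wise across `parts` simulated workers."""
    m = len(w[0])
    assert m % parts == 0, "columns must divide evenly across workers"
    step = m // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def column_parallel_linear(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Each worker computes x @ W_p on its own column shard. Concatenating the
    partial outputs (an all-gather on real GPUs) reproduces x @ W exactly."""
    shards = split_columns(w, parts)
    partial = [matmul(x, shard) for shard in shards]  # would run in parallel
    return [sum((out[i] for out in partial), []) for i in range(len(x))]
```

Every worker needs the full input `x`, but it stores only its own slice of `W`. This reduction in per-device weight memory is the main reason to use tensor parallelism for large models.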
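Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in their pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis. The code and models are publicly available at https://aka.ms/layoutlmv3.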
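The word-patch alignment objective reduces to a binary target for each text word. The sketch below builds those targets under two assumptions. First, each word maps to exactly one image patch; in practice this would be derived from word bounding boxes. Second, words that are themselves masked are excluded from the loss. Both assumptions are illustrative, and `wpa_labels` is a hypothetical helper, not part of the released LayoutLMv3 code.

```python
from typing import Dict, Optional, Set


def wpa_labels(
    word_to_patch: Dict[int, int],
    masked_words: Set[int],
    masked_patches: Set[int],
) -> Dict[int, Optional[int]]:
    """Binary word-patch alignment targets (illustrative).

    For each unmasked word, the target is 1 if the image patch covering it is
    masked and 0 otherwise. Masked words get None, meaning they are excluded
    from this objective (an assumption in this sketch).
    """
    labels: Dict[int, Optional[int]] = {}
    for word, patch in word_to_patch.items():
        if word in masked_words:
            labels[word] = None
        else:
            labels[word] = 1 if patch in masked_patches else 0
    return labels
```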
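Tracking a visual object from a single initial exemplar in the testing phase has been broadly cast as a one-/few-shot problem, i.e., one-shot learning for initial adaptation and few-shot learning for online adaptation. Recent few-shot online adaptation methods incorporate prior knowledge from large amounts of annotated training data via complex meta-learning optimization in the offline phase. This helps online deep trackers achieve fast adaptation and reduces the risk of overfitting in tracking. In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation that requires no offline training. It provides an in-built memory retention mechanism that lets the model remember knowledge about previously seen objects, so the seen data can be safely removed from training. This also bears certain similarities to the emerging field of continual learning, which aims to prevent catastrophic forgetting. This mechanism enables us to unveil the power of modern online deep trackers without incurring excessive computational cost. We evaluate our approach on two networks from the online learning families for tracking, i.e., the multi-layer perceptrons in RT-MDNet and the convolutional neural networks in DiMP. Consistent improvements on several challenging tracking benchmarks demonstrate its effectiveness and efficiency.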
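The memory retention property comes from the recursive least-squares (RLS) estimator itself. All past samples are summarized in the matrix P, an approximation of the inverse input-correlation matrix, so old data never needs to be revisited. The sketch below is the textbook exponentially weighted RLS update for a linear model. It shows the estimator only, not how the authors integrate it into RT-MDNet or DiMP. The initial value `delta` and the forgetting factor `lam` are illustrative defaults.

```python
from typing import List


class RecursiveLeastSquares:
    """Textbook exponentially weighted RLS for a linear model y ~ w . x."""

    def __init__(self, dim: int, lam: float = 1.0, delta: float = 1000.0) -> None:
        self.lam = lam  # forgetting factor; 1.0 means past samples are never down-weighted
        self.w = [0.0] * dim
        # P approximates the inverse input-correlation matrix; start at delta * I
        self.P = [[delta if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def update(self, x: List[float], y: float) -> None:
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        k = [v / denom for v in Px]  # gain vector
        err = y - sum(self.w[i] * x[i] for i in range(n))  # a priori prediction error
        self.w = [self.w[i] + k[i] * err for i in range(n)]
        # P <- (P - k x^T P) / lam; x^T P equals (P x)^T because P is symmetric
        self.P = [[(self.P[i][j] - k[i] * Px[j]) / self.lam for j in range(n)]
                  for i in range(n)]
```

Each `update` call consumes one sample and then discards it. The weights still match the (lightly regularized) least-squares solution over all samples seen so far.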
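In this paper, we study the importance of pruning in deep networks (DNs) and the yin & yang relationship between (1) pruning highly overparameterized DNs that have been trained from random initialization and (2) training small DNs that have been "cleverly" initialized. Since in most cases practitioners can only resort to random initialization, there is a strong need to develop a grounded understanding of DN pruning. The current literature remains largely empirical and lacks a theoretical understanding of how pruning affects a DN's decision boundary, how to interpret pruning, and how to design corresponding principled pruning techniques. To tackle these questions, we propose to employ recent advances in the theoretical analysis of continuous piecewise affine (CPA) DNs. From this perspective, we are able to detect the early-bird (EB) ticket phenomenon, provide interpretability for current pruning techniques, and develop a principled pruning strategy. In each step of our study, we conduct extensive experiments supporting our claims and results. Although our main goal is to enhance the current understanding of DN pruning rather than to develop a new pruning method, our spline pruning criteria are on par with or even outperform state-of-the-art pruning methods in both layerwise and global pruning.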
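The CPA view can be checked directly on a small ReLU network. Within the input-space region that shares a given activation pattern, the network is exactly one affine map, A z + c. For a single hidden layer, the region boundaries are the hyperplanes W1[i]·z + b1[i] = 0, so pruning hidden unit i removes one boundary from the partition. The sketch below illustrates this general property only and does not implement the spline pruning criteria proposed here. The network sizes and helper names are arbitrary.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def relu_net(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    """One-hidden-layer ReLU network f(x) = W2 relu(W1 x + b1) + b2."""
    h = [max(0.0, sum(W1[i][j] * x[j] for j in range(len(x))) + b1[i]) for i in range(len(W1))]
    return [sum(W2[k][i] * h[i] for i in range(len(h))) + b2[k] for k in range(len(W2))]


def local_affine_map(x: Vector, W1: Matrix, b1: Vector, W2: Matrix,
                     b2: Vector) -> Tuple[Matrix, Vector]:
    """Return (A, c) such that f(z) = A z + c for every z with x's activation pattern."""
    pattern = [1.0 if sum(W1[i][j] * x[j] for j in range(len(x))) + b1[i] > 0 else 0.0
               for i in range(len(W1))]
    # A = W2 diag(pattern) W1 and c = W2 diag(pattern) b1 + b2
    A = [[sum(W2[k][i] * pattern[i] * W1[i][j] for i in range(len(W1))) for j in range(len(x))]
         for k in range(len(W2))]
    c = [sum(W2[k][i] * pattern[i] * b1[i] for i in range(len(W1))) + b2[k]
         for k in range(len(W2))]
    return A, c
```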
Are large language models (LLMs) like GPT-3 psychologically safe? In this work, we design unbiased prompts to evaluate LLMs systematically from a psychological perspective. First, we test the personality traits of three different LLMs with the Short Dark Triad (SD-3) and the Big Five Inventory (BFI). We find that all of them score higher on SD-3 than the human average, indicating a relatively darker personality. Furthermore, LLMs like InstructGPT and FLAN-T5, which are fine-tuned with safety metrics, do not necessarily have more positive personalities: they score higher on Machiavellianism and Narcissism than GPT-3. Second, we test the LLMs in the GPT-3 series on well-being tests to study the impact of fine-tuning with more training data. Interestingly, we observe a continuous increase in well-being scores from GPT-3 to InstructGPT. Following these observations, we show that instruction-finetuning FLAN-T5 with positive answers to the BFI can effectively improve the model from a psychological perspective. Finally, we call on the community to evaluate and improve LLMs' safety systematically rather than only at the sentence level.
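Questionnaires such as SD-3 and BFI are typically scored by averaging Likert-scale responses per trait after flipping reverse-keyed items. The per-trait averages can then be compared with published human means. The sketch below shows that scoring step. The item numbers and reverse-keyed set are hypothetical placeholders. The real keying for each instrument must be taken from its official scoring instructions.

```python
from statistics import mean
from typing import Dict, List, Set


def score_trait(responses: Dict[int, int], items: List[int], reverse_keyed: Set[int],
                low: int = 1, high: int = 5) -> float:
    """Mean Likert score for one trait.

    Reverse-keyed items are flipped as (high + low - response), so on a 1-5
    scale a 5 becomes 1 and a 2 becomes 4. Item IDs here are hypothetical.
    """
    vals = []
    for item in items:
        r = responses[item]
        if not low <= r <= high:
            raise ValueError(f"item {item}: response {r} outside [{low}, {high}]")
        vals.append(high + low - r if item in reverse_keyed else r)
    return mean(vals)
```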
Modeling the noise transition matrix is a promising approach to learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called the Label Distribution (LD) in this paper, can be calculated and used as supervision. To reliably estimate the noise transition matrix, some methods assume that anchor points are available during training. Nonetheless, if anchor points are invalid, the noise transition matrix might be poorly learned, resulting in poor performance. Consequently, other methods treat reliable data points extracted from the training data as pseudo anchor points. However, from a statistical point of view, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption. Therefore, we aim to estimate the noise transition matrix without (pseudo) anchor points. There is evidence that samples are more likely to be mislabeled as other, similar classes, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated as supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix, with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.
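Noisy and clean posteriors are linked through the transition matrix. With T[i][j] = P(noisy label = j | clean label = i), the noisy posterior is p̃ = Tᵀ p. Given an invertible T, the clean posterior is recovered by solving that linear system. The sketch below shows only this relation, using a hand-picked 3-class T as an assumption. It does not reproduce how this work estimates T from the inter-class correlation matrix.

```python
from typing import List


def noisy_posterior(clean: List[float], T: List[List[float]]) -> List[float]:
    """p_noisy[j] = sum_i p_clean[i] * T[i][j], where T[i][j] = P(noisy=j | clean=i)."""
    n = len(clean)
    return [sum(clean[i] * T[i][j] for i in range(n)) for j in range(n)]


def clean_posterior(noisy: List[float], T: List[List[float]]) -> List[float]:
    """Recover the clean posterior by solving T^T p = noisy (Gauss-Jordan elimination)."""
    n = len(noisy)
    # Augmented matrix [T^T | noisy]: row i holds column i of T plus noisy[i]
    A = [[T[j][i] for j in range(n)] + [noisy[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivoting
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("transition matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]
```

The diagonally dominant T used in the check below reflects the intuition that the clean label is the most likely observed label. Diagonal dominance also guarantees that T is invertible.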
Data heterogeneity across clients in federated learning (FL) settings is a widely acknowledged challenge. In response, personalized federated learning (PFL) emerged as a framework to curate local models for clients' tasks. In PFL, a common strategy is to develop local and global models jointly: the global model (for generalization) informs the local models, and the local models (for personalization) are aggregated to update the global model. A key observation is that if we can improve the generalization ability of local models, then we can improve the generalization of global models, which in turn builds better personalized models. In this work, we consider class imbalance, an overlooked type of data heterogeneity, in the classification setting. We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes. FedNH initially distributes class prototypes uniformly in the latent space and smoothly infuses the class semantics into the class prototypes. We show that imposing uniformity helps to combat prototype collapse, while infusing class semantics improves local models. Extensive experiments were conducted on popular classification datasets under the cross-device setting. Our results demonstrate the effectiveness and stability of our method over recent works.
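The two prototype ingredients can be illustrated in a few lines. For uniformity, this sketch substitutes a closed-form construction that may differ from how FedNH actually initializes its prototypes: the regular simplex. Its C unit vectors all have pairwise cosine −1/(C − 1), the most spread-out arrangement possible for C points on a sphere, and it assumes the latent dimension equals the number of classes. For semantics, it uses a hypothetical moving-average update that nudges each prototype toward an aggregated class-mean embedding with weight `rho` and then re-normalizes. Both the update rule and the default `rho` are assumptions.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def simplex_prototypes(num_classes: int) -> List[Vector]:
    """Unit vectors (e_c - mean of all e_i) in R^C; every pair has cosine -1/(C-1)."""
    C = num_classes
    return [normalize([(1.0 if i == c else 0.0) - 1.0 / C for i in range(C)])
            for c in range(C)]


def infuse_semantics(prototypes: List[Vector], class_means: List[Vector],
                     rho: float = 0.9) -> List[Vector]:
    """Hypothetical smooth update: mix each prototype with its normalized class-mean
    embedding (rho near 1 keeps uniformity dominant), then re-project to the sphere."""
    return [normalize([rho * p + (1.0 - rho) * m for p, m in zip(proto, normalize(cm))])
            for proto, cm in zip(prototypes, class_means)]
```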
Air pollution is a crucial issue affecting human health and livelihoods, as well as one of the barriers to economic and social growth. Forecasting air quality has become an increasingly important endeavor with significant social impacts, especially in emerging countries like China. In this paper, we present a novel Transformer architecture termed AirFormer to collectively predict nationwide air quality in China, with an unprecedented fine spatial granularity covering thousands of locations. AirFormer decouples the learning process into two stages: 1) a bottom-up deterministic stage that contains two new types of self-attention mechanisms to efficiently learn spatio-temporal representations; and 2) a top-down stochastic stage with latent variables to capture the intrinsic uncertainty of air quality data. We evaluate AirFormer with 4 years of data from 1,085 stations in the Chinese Mainland. Compared to the state-of-the-art model, AirFormer reduces prediction errors by 5% to 8% on 72-hour future predictions. Our source code is available at https://github.com/yoshall/airformer.
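For readers unfamiliar with temporal self-attention in forecasting, the sketch below shows a generic single-head scaled dot-product attention with a causal mask. The representation at time step t attends only to steps up to and including t, so future readings cannot leak into the output for earlier steps. This is the standard building block, not either of AirFormer's two specific attention mechanisms, and the toy Q/K/V values are arbitrary.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_self_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Single-head scaled dot-product attention over T time steps.

    Step t may only attend to steps s <= t; later steps are masked out.
    """
    T, d = len(Q), len(Q[0])
    out = []
    for t in range(T):
        scores = [sum(Q[t][i] * K[s][i] for i in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        out.append([sum(weights[s] * V[s][j] for s in range(t + 1)) / z
                    for j in range(len(V[0]))])
    return out
```

Changing the value vector at the last time step leaves the outputs for all earlier steps unchanged. That invariance is exactly the property the causal mask is meant to enforce.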